The goals / steps of this project are the following:
The images for camera calibration are stored in the folder called camera_cal. The images in test_images are for testing your pipeline on single frames. The video called project_video.mp4 is the video your pipeline should work well on. challenge_video.mp4 is an extra (and optional) challenge for you if you want to test your pipeline.
If you're feeling ambitious (totally optional though), don't stop there! We encourage you to go out and take video of your own, calibrate your camera and show us how you would implement this project from scratch!
I have spent much more time on this project than on any previous one. It has been good practice for getting familiar with many computer vision concepts and techniques, such as camera calibration and perspective transforms. But to be frank, I am still not convinced that the algorithms used here are general and robust enough to be useful for practical self-driving, with the main reasons being:
A lot of literature on lane detection with traditional computer vision can be found in previous publications, but I think most of the focus has recently shifted toward more data-driven approaches. I hope more advanced techniques will be covered in the later part of the course to address those shortcomings.
The code is organized in the package sdclane, with main functions implemented in the following files:

- camera.py: camera calibration
- line_detection.py: line detection by several methods, such as Sobel in different color spaces
- transform.py: perspective transform to get a bird's-eye view of the lanes
- lane_detection.py: the main class LaneDetector builds a pipeline from the previous steps to detect/estimate lanes in images and videos
- config.py: configuration such as images for camera calibration and testing
- utility.py: helper functions for the pipeline and image I/O

%matplotlib inline
from sdclane import config, utility, camera, line_detection, transform, lane_detection
import cv2
import matplotlib.pyplot as plt
import numpy as np
from moviepy.editor import VideoFileClip
from IPython.display import HTML
np.random.seed(1337)
OpenCV functions or other methods were used to calculate the correct camera matrix and distortion coefficients using the calibration chessboard images provided in the repository. The distortion matrix should be used to un-distort the test calibration image provided as a demonstration that the calibration is correct.
Image undistortion is implemented in sdclane.camera.CameraCalibrator class.
- The fit method uses a set of chessboard images and estimates the distortion matrix and coefficients as self.M and self.d.
- The undistort method takes a raw image and undistorts it accordingly.
- A few images in the camera_cal folder have different specifications, e.g., number of row/col corners; those are simply ignored.
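As an illustration of this calibration step, here is a minimal sketch with hypothetical helper names (make_object_points, calibrate_camera), not the actual CameraCalibrator internals:

```python
import numpy as np

def make_object_points(nx, ny):
    """(x, y, 0) grid for the nx-by-ny inner chessboard corners."""
    objp = np.zeros((nx * ny, 3), np.float32)
    objp[:, :2] = np.mgrid[0:nx, 0:ny].T.reshape(-1, 2)
    return objp

def calibrate_camera(gray_images, nx, ny):
    """Collect corner detections across views, then run cv2.calibrateCamera."""
    import cv2  # deferred import so make_object_points stays numpy-only
    objpoints, imgpoints, shape = [], [], None
    for gray in gray_images:
        found, corners = cv2.findChessboardCorners(gray, (nx, ny))
        if not found:
            continue  # images with a different corner layout are simply skipped
        objpoints.append(make_object_points(nx, ny))
        imgpoints.append(corners)
        shape = gray.shape[::-1]  # (width, height)
    _, mtx, dist, _, _ = cv2.calibrateCamera(objpoints, imgpoints, shape,
                                             None, None)
    return mtx, dist
```

The returned mtx and dist can then be fed to cv2.undistort for every raw frame.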
The results of undistortion are shown below.

## helper to display image grid
def grid_plot(image_cols):
    ncols = len(image_cols)
    nrows = len(image_cols[0][1])
    fig, axes = plt.subplots(nrows, ncols, figsize=(8 * ncols, 4 * nrows))
    fig.tight_layout()
    fig.subplots_adjust(wspace=0.1, hspace=0.1)
    for r, ax in enumerate(axes):
        for c, (colname, imgs) in enumerate(image_cols):
            img = imgs[r]
            cmap = plt.cm.gray if img.ndim < 3 else None
            ax[c].imshow(img, cmap=cmap)
            ax[c].set_axis_off()
            ax[c].set_title(colname)
## load the test images
test_imgs = utility.read_rgb_imgs(config.test_img_files)
Distortion correction that was calculated via camera calibration has been correctly applied to each image.
The results of distortion correction are shown below. In detail, the sdclane.camera.CameraCalibrator.fit() method reads a set of chessboard images from the camera_cal/ folder (assumed to come from the same camera) and calls cv2.calibrateCamera to estimate the calibration matrix and distortion coefficients. These estimated parameters are then used in the undistort() method.
Visually there is some right shift in the undistorted images.
undistort = camera.build_undistort_function()
undistorted_imgs = list(map(undistort, test_imgs))
grid_plot([("original", test_imgs),
           ("undistorted", undistorted_imgs)])
At least two methods (i.e., color transforms, gradients) have been combined to create a binary image containing likely lane pixels. There is no "ground truth" here, just visual verification that the pixels identified as part of the lane lines are, in fact, part of the lines.
There are two steps implemented for line detection:
- The gradient and color binaries are combined as lines_with_gradx AND (left_line OR right_line).
- The L and S channels of HLS images are especially good at detecting bright lines in spite of color changes and shadows.

This is implemented in sdclane.line_detection.LineDetector.detect().
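A rough sketch of this thresholding logic follows; np.gradient stands in for cv2.Sobel so the example is dependency-free, and the thresholds and helper names are illustrative rather than the actual LineDetector values:

```python
import numpy as np

def binary_threshold(channel, lo, hi):
    """1 where the channel value falls inside [lo, hi), else 0."""
    out = np.zeros_like(channel, dtype=np.uint8)
    out[(channel >= lo) & (channel < hi)] = 1
    return out

def gradient_x_binary(gray, lo=20, hi=255):
    """Threshold the absolute horizontal gradient of a grayscale image."""
    gx = np.abs(np.gradient(gray.astype(float), axis=1))
    gx = np.uint8(255 * gx / (gx.max() + 1e-9))  # scale to 0..255
    return binary_threshold(gx, lo, hi)

def combine(gradx, left_line, right_line):
    """lines_with_gradx AND (left_line OR right_line), as in the text."""
    return gradx & (left_line | right_line)
```

The left_line and right_line masks would come from thresholding the L and S channels of the HLS image.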
The results of the pipeline are shown below.
detect_line = line_detection.LineDetector().detect
roi_crop = transform.build_trapezoidal_bottom_roi_crop_function()
line_imgs = list(map(detect_line, undistorted_imgs))
roi_line_imgs = list(map(roi_crop, line_imgs))
grid_plot([("undistorted", undistorted_imgs),
           ("lines", line_imgs),
           ("lines in ROI", roi_line_imgs)])
OpenCV function or other method has been used to correctly rectify each image to a "birds-eye view"
The perspective transform is implemented in sdclane.transform.PerspectiveTransformer class, in several steps:
- The source trapezoid and target rectangle are configured in the sdclane.config package.
- A ransac model is fitted to estimate a robust model of lines for each side.
- The transform matrix is computed with cv2.getPerspectiveTransform on the trapezoid and rectangle.
- Meter-per-pixel ratios are estimated as x_mpp and y_mpp so later you can use them to estimate other parameters such as curvatures and center offsets. x_mpp is relatively straightforward: we assume the width of the lane is always 3 meters, and x_mpp is the ratio of lane width w.r.t. the width of the target rectangle in warped space. I estimated y_mpp in a slightly different way from the class: I chose the longest segment of the dotted lane and assumed it to be 3 meters in reality (as suggested in the class). This gives different curvature and offset estimates later on, but I am not really sure which is more (or less) accurate, because the method used in the class is also quite ad hoc.
- The transform is applied in PerspectiveTransformer.transform() and PerspectiveTransformer.binary_transform().

The results of the transform are shown below. Visually the lanes in the bird's-eye view are clear and roughly parallel.
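The core of such a transform can be sketched as follows; the 3-meter lane-width assumption follows the text, while the function names are illustrative rather than the actual PerspectiveTransformer API:

```python
import numpy as np

def estimate_x_mpp(lane_width_m, rect_width_px):
    """x meters-per-pixel: assumed lane width over the warped rectangle width."""
    return lane_width_m / rect_width_px

def warp_to_birdseye(img, src_trapezoid, dst_rectangle):
    """Map the road trapezoid onto a rectangle in bird's-eye view."""
    import cv2  # deferred import so estimate_x_mpp stays numpy-only
    M = cv2.getPerspectiveTransform(np.float32(src_trapezoid),
                                    np.float32(dst_rectangle))
    h, w = img.shape[:2]
    return cv2.warpPerspective(img, M, (w, h), flags=cv2.INTER_LINEAR)
```

With the assumed 3 m lane width and, say, a 600 px target rectangle, estimate_x_mpp(3.0, 600) gives 0.005 m per pixel.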
transformer = transform.build_default_warp_transformer()
warped_imgs = list(map(transformer.transform, test_imgs))
grid_plot([("undistorted", undistorted_imgs),
("birdeye view", warped_imgs)])
Methods have been used to identify lane line pixels in the rectified binary image. The left and right lines have been identified and fit with a curved functional form (e.g., spline or polynomial).
The same techniques could be used to detect the lane pixels in the warped images. However, I chose to directly transform the line pixels from the original image space to the bird's-eye view space, using PerspectiveTransformer.binary_transform(). This is based on the observations that (1) the line detection in the original images is already visually good, (2) in the bird's-eye view the lanes are still clear, and (3) it makes the code simpler.
The resulting lane pixels in the bird's-eye view are shown below. We can see there is some noise in the final lane images, which needs to be removed before parameter estimation.
lane_imgs = list(map(transformer.binary_transform, roi_line_imgs))
grid_plot([
    ("undistorted", undistorted_imgs),
    ("lanes in original space", roi_line_imgs),
    ("lanes in bird-eye view", lane_imgs)
])
Here the idea is to take the measurements of where the lane lines are and estimate how much the road is curving and where the vehicle is located with respect to the center of the lane. The radius of curvature may be given in meters assuming the curve of the road follows a circle and the position of the vehicle within the lane may be given as meters off of center.
Now that we have the lane pixels in the bird's-eye view and the meter-per-pixel ratios for both x and y, estimating the curvature and the center offset is straightforward. The whole process is implemented in sdclane.lane_detection.LaneDetector.detect_image():
- Lane pixels are extracted and cleaned up by LaneDetector.get_lane_pixels().
- Curvature and center offset are estimated by LaneDetector.estimate_lane_params(). To calculate these parameter values in real-world units, the previously estimated meter-per-pixel ratios x_mpp and y_mpp from the transform are used.

The fit from the rectified image has been warped back onto the original image and plotted to identify the lane boundaries. This should demonstrate that the lane boundaries were correctly identified.
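The curvature and offset computations can be sketched like this: a second-order polynomial is fit in meter units and the standard radius-of-curvature formula is evaluated at the bottom of the image. The helper names are illustrative, not the actual estimate_lane_params internals:

```python
import numpy as np

def lane_curvature_m(xs, ys, x_mpp, y_mpp):
    """Fit x = A*y^2 + B*y + C in meters; radius R = (1 + (2Ay+B)^2)^1.5 / |2A|."""
    fit = np.polyfit(ys * y_mpp, xs * x_mpp, 2)
    A, B = fit[0], fit[1]
    y_eval = ys.max() * y_mpp  # evaluate nearest to the vehicle
    return (1 + (2 * A * y_eval + B) ** 2) ** 1.5 / abs(2 * A)

def center_offset_m(left_x, right_x, img_width, x_mpp):
    """Offset of the image center from the lane center, in meters."""
    lane_center = (left_x + right_x) / 2.0
    return (img_width / 2.0 - lane_center) * x_mpp
```

On synthetic points sampled from x = y^2 / (2R), the recovered radius should be close to R.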
The whole pipeline from a camera image, to undistorted, to lane detection, to bird-eye view, and finally parameter estimation and visual check, is implemented in the sdclane.lane_detection.LaneDetector.test_image() method.
The final results of the pipeline are depicted below.
# build lane detector
lane_detector = lane_detection.LaneDetector()
lane_estimates = [lane_detector.detect_image(img)[1] for img in test_imgs]
grid_plot([
    ("camera images", test_imgs),
    ("lane estimates", lane_estimates)
])
The image processing pipeline that was established to find the lane lines in images successfully processes the video. The output here should be a new video where the lanes are identified in every frame, and outputs are generated regarding the radius of curvature of the lane and vehicle position within the lane. The identification and estimation don't need to be perfect, but they should not be wildly off in any case. The pipeline should correctly map out curved lines and not fail when shadows or pavement color changes are present.
The same lane detection pipeline for videos is also implemented in the sdclane.lane_detection.LaneDetector class, under the LaneDetector.detect_video() method. The result is shown below.
In the first few frames of video, the algorithm should perform a search without prior assumptions about where the lines are (i.e., no hard coded values to start with). Once a high-confidence detection is achieved, that positional knowledge may be used in future iterations as a starting point to find the lines.
The LaneDetector.detect_video() method is implemented so that:

- a full search is performed (same as in the test_image() method) if there are no estimates available from the previous frame;
- otherwise, a faster search based on the previous frame's estimates is performed in the LaneDetector.process_frame() method.

As soon as a high confidence detection of the lane lines has been achieved, that information should be propagated to the detection step for the next frame of the video, both as a means of saving time on detection and in order to reject outliers (anomalous detections).
In detail, the "faster search" based on tracking the last frame is implemented in LaneDetector.process_frame(). There are many heuristic parameters that hard-code the detection algorithm, such as the sliding window size, and I am not confident that it will work in new scenarios.
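The idea of restricting the search to a band around the previous fit can be sketched as follows; the margin and function name are illustrative, and the real process_frame adds sliding windows and further heuristics on top:

```python
import numpy as np

def search_around_prior(binary_warped, prior_fit, margin=100):
    """Keep nonzero pixels within +/- margin of the previous polynomial fit.

    prior_fit holds polynomial coefficients for x as a function of y,
    highest order first (np.polyval convention).
    """
    ys, xs = binary_warped.nonzero()
    fit_x = np.polyval(prior_fit, ys)        # expected lane x at each pixel row
    keep = np.abs(xs - fit_x) < margin       # band around the previous fit
    return xs[keep], ys[keep]
```

The surviving pixels are then refit, so each frame only searches near where the lane was last seen.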
The result on project_video.mp4 is shown below. The algorithm works partially on the two challenge videos, when certain assumptions made in the code are met. However, I didn't go further and modify the code to handle these challenges. As mentioned above, I am not really convinced by the approach in this project, so even if it succeeded on the challenge videos, I would have little confidence that it would work in new scenarios.
clip_output_file = 'marked_project_video.mp4'
clip = VideoFileClip("project_video.mp4")
clip_output = lane_detector.detect_video(clip)
%time clip_output.write_videofile(clip_output_file, audio=False)
HTML("""
<video width="960" height="540" controls>
<source src="{0}">
</video>
""".format('marked_project_video.mp4'))
The Readme file submitted with this project includes a detailed description of what steps were taken to achieve the result, what techniques were used to arrive at a successful result, what could be improved about their algorithm/pipeline, and what hypothetical cases would cause their pipeline to fail.
I have explained the general steps of the pipeline above. More details can be found in the referenced code; comments are given across the files where necessary.
I have also shared my thoughts on why the method used here may not be satisfactory in practice. And practically, my implementation is not really fast enough for real-time lane detection.
The detection of lanes is quite smooth across successive frames because a smoothing method has been implemented to take moving averages of the estimates. However, the middle curve is still a little bumpy; modifying the weight coefficient might help, but again that is another heuristic parameter that may not generalize to new cases.
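The frame-to-frame smoothing can be sketched as an exponential moving average over the fit coefficients, where alpha plays the role of the weight coefficient mentioned above; the class name is illustrative, not the actual implementation:

```python
import numpy as np

class FitSmoother:
    """Exponential moving average over successive polynomial fits."""

    def __init__(self, alpha=0.2):
        self.alpha = alpha  # weight of the newest estimate
        self.state = None   # smoothed coefficients so far

    def update(self, fit):
        fit = np.asarray(fit, dtype=float)
        if self.state is None:
            self.state = fit  # first frame: nothing to average with
        else:
            self.state = self.alpha * fit + (1 - self.alpha) * self.state
        return self.state
```

A larger alpha tracks the newest frame more closely (less smoothing); a smaller alpha damps the bumps at the cost of lagging behind real lane changes.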
With that said, I have learned a lot about computer vision through this project, and in that sense I think I have achieved the goal!